Distributional Measures of Semantic Distance: A Survey
نویسندگان
چکیده
The ability to mimic human notions of semantic distance has widespread applications. Some measures rely only on raw text (distributional measures) and some rely on knowledge sources such as WordNet. Although extensive studies have been performed to compare WordNet-based measures with human judgment, the use of distributional measures as proxies to estimate semantic distance has received little attention. Even though they have traditionally performed poorly when compared to WordNet-based measures, they lay claim to certain uniquely attractive features, such as their applicability in resource-poor languages and their ability to mimic both semantic similarity and semantic relatedness. Therefore, this paper presents a detailed study of distributional measures. Particular attention is paid to flesh out the strengths and limitations of both WordNet-based and distributional measures, and how distributional measures of distance can be brought more in line with human notions of semantic distance. We conclude with a brief discussion of recent work on hybrid measures.
منابع مشابه
Measuring Semantic Distance using Distributional Profiles of Concepts
Automatic measures of semantic distance can be classified into two kinds: (1) those, such as WordNet, that rely on the structure of manually created lexical resources and (2) those that rely only on co-occurrence statistics from large corpora. Each kind has inherent strengths and limitations. Here we present a hybrid approach that combines corpus statistics with the structure of a Roget-like th...
متن کاملLexical Chains Using Distributional Measures of Concept Distance
In practice, lexical chains are typically built using term reiteration or resource-based measures of semantic distance. The former approach misses out on a significant portion of the inherent semantic information in a text, while the latter suffers from the limitations of the linguistic resource it depends upon. In this paper, chains are constructed using the framework of distributional measure...
متن کاملAssessing the time course of the influence of featural, distributional and spatial representations during reading
What does semantic similarity between two concepts mean? How could we measure it? The way in which semantic similarity is calculated might differ depending on the theoretical notion of semantic representation. In an eyetracking reading experiment, we investigated whether two widely used semantic similarity measures (based on featural or distributional representations) have distinctive effects o...
متن کاملDistributional measures of concept-distance: A task-oriented evaluation
We propose a framework to derive the distance between concepts from distributional measures of word co-occurrences. We use the categories in a published thesaurus as coarse-grained concepts, allowing all possible distance values to be stored in a concept–concept matrix roughly .01% the size of that created by existing measures. We show that the newly proposed concept-distance measures outperfor...
متن کاملCross-Lingual Distributional Profiles of Concepts for Measuring Semantic Distance
We present the idea of estimating semantic distance in one, possibly resource-poor, language using a knowledge source in another, possibly resource-rich, language. We do so by creating cross-lingual distributional profiles of concepts, using a bilingual lexicon and a bootstrapping algorithm, but without the use of any sense-annotated data or word-aligned corpora. The cross-lingual measures of s...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1203.1858 شماره
صفحات -
تاریخ انتشار 2012